Naturaren kume ezkutuak
In this article, black holes are our subject of study. Since general relativity is essential to properly understand these peculiar astronomical objects, the first pages of the article review the main ideas of that theory. Next, the death of stars is discussed. We will see that the largest stars become black holes. Several surprising properties of black holes are also explored in the article. In that section, black hole evaporation takes centre stage: after explaining some basic concepts of quantum mechanics, it is shown that black holes are not so black after all. Finally, the reader is immersed in the dark world of singularities. We will try to follow the few rays of light that exist there, and, for the dreamers, some details about time machines are also discussed.
Egocentric Vision-based Action Recognition: A survey
[EN] The egocentric action recognition (EAR) field has recently grown in popularity due to the affordable and lightweight wearable cameras now available, such as GoPro and similar devices. As a result, the amount of egocentric data being generated has increased, triggering interest in the understanding of egocentric videos. More specifically, the recognition of actions in egocentric videos has gained popularity due to the challenge that it poses: the wild movement of the camera and the lack of context make it hard to recognise actions with a performance similar to that of third-person vision solutions. This has ignited research interest in the field and, nowadays, many public datasets and competitions can be found in both the machine learning and the computer vision communities. In this survey, we aim to analyse the literature on egocentric vision methods and algorithms. For that, we propose a taxonomy to divide the literature into various categories with subcategories, contributing a more fine-grained classification of the available methods. We also provide a review of the zero-shot approaches used by the EAR community, a methodology that could help to transfer EAR algorithms to real-world applications. Finally, we summarise the datasets used by researchers in the literature. We gratefully acknowledge the support of the Basque Government's Department of Education for the predoctoral funding of the first author. This work has been supported by the Spanish Government under the FuturAAL-Context project (RTI2018-101045-B-C21) and by the Basque Government under the Deustek project (IT-1078-16-D).
Combining Users' Activity Survey and Simulators to Evaluate Human Activity Recognition Systems
Evaluating human activity recognition systems usually implies following expensive and time-consuming methodologies, where experiments with humans are run with the consequent ethical and legal issues. To overcome these problems, we propose a novel evaluation methodology based on user surveys and a synthetic dataset generator tool. Surveys capture how different users perform activities of daily living, while the synthetic dataset generator is used to create properly labelled activity datasets modelled on the information extracted from the surveys. Important aspects, such as sensor noise, varying time lapses and erratic user behaviour, can also be simulated using the tool. The proposed methodology is shown to have very important advantages that allow researchers to carry out their work more efficiently. To evaluate the approach, a synthetic dataset generated following the proposed methodology is compared to a real dataset by computing the similarity between sensor occurrence frequencies. It is concluded that the similarity between the two datasets is highly significant.
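The abstract does not specify how the similarity between sensor occurrence frequencies is computed; a minimal sketch of one plausible comparison is given below, using cosine similarity over per-sensor relative frequencies (the metric choice, event format and sensor names are illustrative assumptions, not the paper's actual procedure):

from collections import Counter

def sensor_frequencies(events):
    """Relative occurrence frequency of each sensor in a list of (sensor_id, timestamp) events."""
    counts = Counter(sensor for sensor, _ in events)
    total = sum(counts.values())
    return {sensor: n / total for sensor, n in counts.items()}

def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two sensor-frequency dictionaries (1.0 = identical profiles)."""
    sensors = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(s, 0.0) * freq_b.get(s, 0.0) for s in sensors)
    norm_a = sum(v * v for v in freq_a.values()) ** 0.5
    norm_b = sum(v * v for v in freq_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical usage: compare a simulator-generated dataset against a real one.
real_events = [("kitchen_motion", 1), ("fridge_door", 2), ("kitchen_motion", 3)]
synthetic_events = [("kitchen_motion", 1), ("fridge_door", 2), ("fridge_door", 3)]
print(cosine_similarity(sensor_frequencies(real_events), sensor_frequencies(synthetic_events)))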
Embedding-based real-time change point detection with application to activity segmentation in smart home time series data
[EN] Human activity recognition systems are essential to enable many assistive applications. These systems can be sensor-based or vision-based. When sensor-based systems are deployed in real environments, they must segment sensor data streams on the fly in order to extract features and recognize the ongoing activities. This segmentation can be done with different approaches. One effective approach is to employ change point detection (CPD) algorithms to detect activity transitions (i.e., determining when activities start and end). In this paper, we present a novel real-time CPD method to perform activity segmentation, where neural embeddings (vectors of continuous numbers) are used to represent sensor events. Through empirical evaluation with 3 publicly available benchmark datasets, we conclude that our method is useful for segmenting sensor data, offering significantly better performance than state-of-the-art algorithms on two of them. In addition, we propose the use of retrofitting, a graph-based technique, to adjust the embeddings and introduce expert knowledge into the activity segmentation task, showing empirically that it can improve the performance of our method using three graphs generated from two sources of information. Finally, we discuss the advantages of our approach regarding computational cost, manual effort reduction (no need for hand-crafted features) and cross-environment possibilities (transfer learning) compared to other approaches. This work was carried out with the financial support of FuturAALEgo (RTI2018-101045-A-C22), granted by the Spanish Ministry of Science, Innovation and Universities.
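The abstract does not give the exact change point criterion; the rough sketch below illustrates the general idea of detecting activity transitions from neural embeddings of sensor events, flagging a change point when consecutive windows of embeddings drift apart in cosine distance (the window size, threshold and distance test are assumptions for illustration, not the method evaluated in the paper):

import numpy as np

def detect_change_points(embeddings, window=10, threshold=0.35):
    """Flag a change point when the mean embedding of the upcoming window
    drifts away (in cosine distance) from the mean of the previous window.

    embeddings: array of shape (n_events, dim), one vector per sensor event.
    Returns the indices of detected activity transitions."""
    change_points = []
    for i in range(window, len(embeddings) - window):
        prev_mean = embeddings[i - window:i].mean(axis=0)
        next_mean = embeddings[i:i + window].mean(axis=0)
        cos = np.dot(prev_mean, next_mean) / (
            np.linalg.norm(prev_mean) * np.linalg.norm(next_mean) + 1e-9)
        if 1.0 - cos > threshold:
            change_points.append(i)
    return change_points

In a streaming setting the same test would be applied as each new event arrives, which is what makes this family of methods usable in real time.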
Do Multilingual Language Models Think Better in English?
Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external machine translation system and running inference over the translated input. However, these improvements can be attributed to the use of a separate translation system, which is typically trained on large amounts of parallel data not seen by the language model. In this work, we introduce a new approach called self-translate, which overcomes the need for an external translation system by leveraging the few-shot translation capabilities of multilingual language models. Experiments over 5 tasks show that self-translate consistently outperforms direct inference, demonstrating that language models are unable to leverage their full multilingual potential when prompted in non-English languages. Our code is available at https://github.com/juletx/self-translate
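As described, self-translate first asks the model itself to translate the input into English via few-shot prompting and then runs inference over the translation. A minimal sketch of that two-step pipeline is given below, assuming Hugging Face transformers generation calls; the model choice, prompt templates and task format are illustrative only, and the linked repository contains the authors' actual implementation:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # illustrative multilingual LM, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(prompt, max_new_tokens=64):
    """Greedy continuation of a prompt, returning only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def self_translate_infer(non_english_input, task_prompt):
    # Step 1: few-shot translation into English using the same model (toy one-shot example).
    translation_prompt = (
        "Basque: Kaixo, zer moduz?\nEnglish: Hello, how are you?\n"
        f"Basque: {non_english_input}\nEnglish:"
    )
    english_input = generate(translation_prompt).split("\n")[0].strip()
    # Step 2: run the actual task over the translated input instead of the original.
    return generate(task_prompt.format(input=english_input))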
Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset
Existing work has observed that current text-to-image systems do not accurately reflect explicit spatial relations between objects such as 'left of' or 'below'. We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models. We propose an automatic method that, given existing images, generates synthetic captions that contain 14 explicit spatial relations. We introduce the Spatial Relation for Generation (SR4G) dataset, which contains 9.9 million image-caption pairs for training, and more than 60 thousand captions for evaluation. In order to test generalization, we also provide an 'unseen' split, where the sets of objects in the train and test captions are disjoint. SR4G is the first dataset that can be used to spatially fine-tune text-to-image systems. We show that fine-tuning two different Stable Diffusion models (denoted as SD) yields improvements of up to 9 points in the VISOR metric. The improvement holds in the 'unseen' split, showing that SD is able to generalize to unseen objects. SD improves the state of the art with fewer parameters and avoids complex architectures. Our analysis shows that the improvement is consistent across all relations. The dataset and the code will be publicly available.
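The abstract describes deriving synthetic captions with explicit spatial relations from existing images; the toy sketch below shows how one such caption could be produced from two annotated bounding boxes by comparing their centres (the relation wording, box format and decision rule are assumptions, not the SR4G generation rules):

def spatial_caption(obj_a, obj_b):
    """Return a synthetic caption with an explicit spatial relation.

    obj_a, obj_b: (label, (x_min, y_min, x_max, y_max)) in image coordinates,
    where y grows downwards."""
    (label_a, box_a), (label_b, box_b) = obj_a, obj_b
    cx_a, cy_a = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cx_b, cy_b = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    if abs(cx_a - cx_b) >= abs(cy_a - cy_b):
        relation = "to the left of" if cx_a < cx_b else "to the right of"
    else:
        relation = "above" if cy_a < cy_b else "below"
    return f"a {label_a} {relation} a {label_b}"

# Hypothetical usage with two annotated objects:
print(spatial_caption(("dog", (10, 40, 60, 90)), ("bench", (120, 35, 220, 95))))
# -> "a dog to the left of a bench"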
Image captioning for effective use of language models in knowledge-based visual question answering
Integrating outside knowledge for reasoning in visio-linguistic tasks such as visual question answering (VQA) is an open problem. Given that pretrained language models have been shown to include world knowledge, we propose to use a unimodal (text-only) training and inference procedure based on automatic off-the-shelf captioning of images and pretrained language models. More specifically, we verbalize the image contents and allow language models to better leverage their implicit knowledge to solve knowledge-intensive tasks. Focusing on a visual question answering task which requires external knowledge (OK-VQA), our contributions are: (i) a text-only model that outperforms pretrained multimodal (image-text) models of a comparable number of parameters; (ii) confirmation that our text-only method is especially effective for tasks requiring external knowledge, as it is less effective on a standard VQA task (VQA 2.0); and (iii) our method attains state-of-the-art results when increasing the size of the language model. We also significantly outperform current multimodal systems, even when they are augmented with external knowledge. Our qualitative analysis on OK-VQA reveals that automatic captions often fail to capture relevant information in the images, which seems to be balanced by the better inference ability of the text-only language models. Our work opens up possibilities to further improve inference in visio-linguistic tasks. Ander is funded by a PhD grant from the Basque Government (PRE_2021_2_0143). This work is based upon work partially supported by the Ministry of Science and Innovation of the Spanish Government (DeepKnowledge project PID2021-127777OB-C21) and the Basque Government (IXA excellence research group IT1570-22).
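The verbalise-then-reason pipeline described above, in which an off-the-shelf captioner turns the image into text and a text-only language model answers from that caption plus its own implicit knowledge, can be sketched roughly as follows; the captioning and language models named here are illustrative stand-ins, not the paper's configuration:

from transformers import pipeline

# Off-the-shelf captioner used to verbalise the image (illustrative model choice).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Text-only language model that answers from the caption and its own world knowledge.
qa_model = pipeline("text-generation", model="gpt2")

def answer_question(image_path, question):
    """Caption the image, build a textual prompt, and let the language model answer."""
    caption = captioner(image_path)[0]["generated_text"]
    prompt = f"Image description: {caption}\nQuestion: {question}\nAnswer:"
    output = qa_model(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    return output[len(prompt):].strip()

The point of the design, as the abstract argues, is that once the image is expressed as text, the quality of the answer depends mostly on the language model's implicit knowledge, so scaling up that model improves knowledge-intensive VQA even without multimodal training.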